The effects of immediate knowledge of results (KR) on a computer-administered
test of verbal ability were investigated. The effects of KR were examined on a
50-item conventional test and a stradaptive ability test, and in high- and
low-ability groups. The primary dependent variables were maximum likelihood
ability estimates derived from the item responses. Results indicated that mean
test scores for the High-Ability group receiving KR were higher than for the
No-KR group on both the conventional and stradaptive tests. For Low-Ability
examinees, mean scores were higher under KR conditions than under No-KR
conditions on both tests, but the difference was statistically significant
only for the conventional test. However, the higher mean scores of the
Low-Ability testees on the stradaptive test indicated that for low-ability
examinees, adaptive testing had the same effects on test performance as did
the provision of immediate KR. Knowledge of results did not have significant
effects on response latencies, on response consistency on the stradaptive
test, or on the internal consistency reliability of the conventional test. No
significant score differences were found on a 44-item post-test administered
without KR, indicating that the facilitative effects of knowledge of results
on test performance were confined to the test in which KR was provided. The
results of the study were interpreted as indicating the potential of both
immediate knowledge of results and adaptive testing procedures to increase the
extent to which ability tests measure "maximum performance" levels. (Author)
Effects of Immediate Knowledge
of Results and Adaptive Testing
on Ability Test Performance

The description of ability tests as measures of "maximum performance" 
(Cronbach, 1970) implies that such tests should reflect the highest level of 
performance of which a given individual is capable. According to Cronbach, 
the distinguishing feature of such tests "is that the subject is encouraged
to earn the best score he can" (p. 35). Thus, to the extent that individuals
do not perform to their fullest capabilities on an ability test, the
measurement of those individuals' ability levels may be less accurate, and the
predictive validity of the obtained scores may be reduced.

It is reasonable to assume that examinees will perform to their fullest
capabilities on an ability test only if they are motivated to do so.
According to Samuda (1975):

A person who is being tested usually tries to do his best. Therefore,
motivation is one of the a priori assumptions upon which tests are built.
The great majority of available data leads to the observation that
motivation has a determining effect upon level of performance. Thus
differences in performance may be attributed, in part, to differences in
motivation. (p. 82)

The importance of maintaining examinee motivation at high levels was
recognized in the early days of ability testing by the constructors of
individually administered intelligence tests. Terman (1916) regarded frequent
praise as essential to maintaining high levels of motivation in the
administration of the Stanford-Binet test. Recent versions of the manuals for
the Stanford-Binet (Terman & Merrill, 1960) and the WAIS (Wechsler, 1955)
instruct the examiner to give frequent praise and encouragement to the
examinee. Thus, means of maintaining high examinee motivation have always been
perceived as an important aspect of the administration of individual
intelligence tests. However, such tests have also been characterized by wide
differences among examiners in both administration and scoring (see Sattler &
Theye, 1967, and Weiss & Betz, 1973, for reviews of this literature).

By the end of World War I, group testing had become the predominant 
means of measuring intelligence and abilities (DuBois, 1970). Group tests, 
while characterized by a very high degree of standardization and objectivity, 
had no provision for maintaining high levels of testee effort and motivation. 

The provision of immediate knowledge of results (KR) is one possible means of
increasing motivation that can be incorporated into group-administered
paper-and-pencil tests. Immediate KR has a long history in the study of
human learning and performance. In fact, the facilitative effect of KR in
human learning is one of the best established findings in the research
literature (e.g., Ammons, 1956; Annett, 1961, 1969; Bilodeau & Bilodeau, 1961).



Knowledge of results on classroom tests has been hypothesized to be important 
in motivating classroom achievement (e.g., Ross, 1933) and in facilitating 
learning and retention of learned material. Many studies have been concerned 
with the effects of various delays in returning test results to students on 
their subsequent classroom performance and achievement (e.g., Brown, 1932;
Kulhavy & Anderson, 1972; McMahon, 1973; Newman, Williams, & Hiller, 1974;
Plowman & Stroud, 1942). Ammons (1956) and Annett (1969) have reviewed the
literature on the effects of KR in experimental studies of learning, while
Annett (1969) and Geis and Chapman (1971) have reviewed research relevant to
the use of KR in programmed instruction.

However, little published research has dealt with the effects of KR on
performance on tests of intelligence or ability. Given the importance
ascribed to KR by researchers studying other aspects of human performance, it
is surprising that so little attention has been directed toward the possible
effects of immediate, ongoing KR on individuals' demonstrations of their
fullest intellectual capabilities in the very situation in which those
capabilities are being assessed.

Knowledge of Results in Group-Administered Tests 

Methods of providing KR. The earliest devices used to provide KR on
objective multiple-choice or true-false tests were developed by Pressey
(1926), beginning in 1915. Pressey's interest, however, was in KR as a
teaching device rather than as a means of motivating high levels of test
performance.

These early devices developed by Pressey, called "mechanical instructors,"
soon were replaced by the Pressey (1950) punchboard. This device consisted of
a top punchboard, slotted with as many holes as there were alternative
answers to the questions, and a bottom punchboard with holes only for the
correct answers. When the examinee punched a hole for the correct answer, an
answer sheet lying between the two boards was perforated. As with the
mechanical instructor, the examinee was required to select alternatives until
the correct one was found.

Pressey (1950) described a series of studies concerning the effectiveness
of the punchboard in facilitating learning. These experiments, which
contrasted the performance of examinees using punchboards with that of
examinees using standard answer sheets, indicated that the punchboards
facilitated learning in terms of such criteria as direct and free recall of
tested material. Pressey also found that students liked using the punchboards,
came to depend on the immediate appraisals, and became frustrated when the
later use of standard tests left them without knowledge of results.

More recent examples of KR devices include Montor's (1970) "Trainer-Tester"
and Lord's (1971) "flexilevel" test. The Trainer-Tester uses answer sheets on
which testees erase the ink covering their response choices and thereby are
informed immediately of the correctness of their responses. The
Trainer-Tester requires the testee to continue to erase answers until the
correct one is chosen, thereby facilitating learning.
Lord's flexilevel test, on the other hand, was not originally intended
as a KR device. Instead, it was an implementation of an adaptive or tailored
test in a paper-and-pencil, rather than computer-administered, format. The
flexilevel test uses a specially constructed answer sheet to facilitate
the adaptive item-administration procedure. When examinees choose a wrong
answer, a red spot appears on the answer sheet; when a correct answer is
chosen, a blue spot appears. The color of the spot directs the examinee to
the next test item to be answered.

While this format does not provide direct KR, testees are likely to
realize early in the test that items that were easy for them were followed by
the blue (correct) spot, while items that were difficult were followed by the
red (incorrect) spot. This knowledge can then be generalized to item
responses about which they are unsure. To date there has been only one study
in which the flexilevel test was actually administered by paper and pencil
(Olivier, 1974), and that study was not concerned with the possible effects
of the immediate KR that the examinees were probably receiving.

Thus, while methods for implementing KR during ability testing have
long been available, most studies have been concerned with its effects on
learning and retention. Few studies have examined the function of KR as an
incentive enhancing the immediate performance of individuals on objective
tests. The studies that do exist used classroom achievement tests rather
than intelligence or ability tests.

Effects of KR in achievement tests. A study by Bierbaum (1965) used a
Pressey-type punchboard to study the effects of immediate KR on test
performance. Two parallel classroom tests were administered to a class of
23 students. On the first test, half the students received KR and the other
half did not. On the second test, this condition was reversed so that those
who initially had received KR did not, and vice versa. Results indicated
that significantly more errors were made on the KR items than on the no-KR
items, and this finding was similar in degree for both KR-first and KR-second
groups. Further investigation revealed that students considered the KR
condition to "put them under more pressure," and Bierbaum concluded that KR
may cause increased anxiety. However, in this study students had to continue
to select answers until they found the correct one; it is possible that this
requirement enhanced the pressure they felt to choose the correct answer.

Heald (1970) studied the effects of KR on achievement test performance
and on retention of learned material as measured by a retest after one week.
In Heald's study, two different KR conditions were contrasted with a control
condition. In the "KR-Reference" condition, examinees were informed of the
correctness of their responses. If an answer was incorrect, they were referred
immediately to the passage in the class text that addressed that item;
following reference to the text, they were to respond a second time to the
item. In the "KR-Alone" condition, examinees were informed whether or not they
were correct and were required to continue responding until they answered the
item correctly. The control condition used a standard answer sheet format.
Fifty-four students in a graduate-level course in educational administration
were tested on material relevant to the topic "audiovisual materials for
teachers." Students were classified into high and low test anxiety groups on
the basis of the Sarason Test Anxiety Scale and were assigned randomly to
treatment conditions from within each anxiety group.
Results indicated that KR had significant effects on performance in
both the initial test and the retest. The KR-Reference condition led to higher
test scores than did the KR-Alone condition, and both KR groups performed
better than did students in the control condition on both tests. There were no
significant differences in performance as a function of anxiety level, nor
was there any significant interaction between KR condition and anxiety level.

In Beeson's (1973) study, students were administered tests in which one
half of the items were followed by immediate KR and the other half were given
delayed (post-test) KR. Immediate KR was administered using an IBM card
punchboard. Three groups of students, two college groups and one junior high
school group, were studied, using mathematics achievement tests. Within each
group, half the students received immediate KR on the first half of the items
and delayed KR on the second half; the other half of the group received the
KR conditions in reverse order. This procedure was followed on each of 10
one-hour exams and a final exam. The order of KR for each subgroup was
counterbalanced so that no subject received immediate KR on the same half of
the test in any two consecutive tests.

Results indicated that there were no significant differences within any
of the 10 one-hour exams, but that performance on the immediate-KR half
of the test was significantly better (p < .05) on the final exam. In general,
performance was better when students were given immediate KR, and Beeson
attributed the fact that the difference reached significance only on the final
exam to its being a longer and more reliable test.

Spencer and Barker (1969) studied the effects of immediate KR on
retention of learned material over a time interval. While amount retained was
the major dependent variable of interest, this study also provides data
relevant to the effects of KR on test performance itself. On the first test
given (an achievement test in biology), one group received item-by-item
feedback using a punchboard answer sheet, while the other group used a regular
answer sheet. On the retest given 18 days later, all students used the regular
answer sheet. The group using the punchboard scored significantly lower on
the initial test than did the control group, but on the retest the
experimental group scored significantly higher than did the control group.

One major problem in the study of KR in teacher-constructed tests has
been the failure to control for the possibility that the KR received on one
test item may provide the examinee with information concerning the correct
answers to succeeding items. For example, in Heald's (1970) study reporting
the facilitative effects of KR, the number of relevant cues provided by the KR
easily could have accounted for its beneficial effects. Similarly, the retest
difference in Spencer and Barker's study could have resulted from the learning
effects of KR.

One study explicitly designed to separate these two effects was that of
Strang and Rust (1973), who studied the effects of KR on an achievement test
of course-related facts and their applications. The items were constructed so
that knowledge of results on one item would not provide clues to the answers
of succeeding questions. Thus, the interest was solely in KR as a
motivational variable. In this study, both experimental and control groups
were first administered a 25-item test under no-KR conditions. The students
then were divided into four groups, resulting from a cross-classification of
task definition (test vs. experimental exercise) and KR vs. no-KR. In the two
KR groups, students indicated their answers by erasing one of five answer
spots. If the answer was correct, a appeared, and if
it was incorrect, the letter corresponding to the correct choice appeared.
The results of the 2 × 2 analysis of covariance of the scores on the second
25-item test (using scores on the first 25-item test as the covariate)
indicated that students in the KR condition made significantly more errors.
Additionally, students in the KR condition reported significantly more
nervousness during testing than did students in the no-KR condition. Strang
and Rust hypothesized that the increase in errors under KR may have been
caused by the greater nervousness of the examinees.

Knowledge of Results in Individually Administered Tests 

In addition to the studies of KR on classroom achievement tests, two
studies have introduced KR into the administration of individual
intelligence tests. Sweet and Ringness (1971) administered the Wechsler
Intelligence Scale for Children (WISC) to elementary school boys under one of
three conditions. In the first condition, the WISC was administered in the
"standard" manner. In the second condition, students were told by the
examiner if a response had been "correct" or "mostly correct"; the examiner
made no response when an answer was incorrect. The third condition used
the award of a poker chip, exchangeable later for money, following each
correct response.

Results indicated that there were no differential treatment effects for
middle-class whites or for lower-class blacks, but that lower-class whites
performed significantly better when reinforced, either with KR or with poker
chips, for their correct responses. Sweet and Ringness explained their
results by concluding, first, that middle-class children already perform at
a high level under standard administrative conditions and do not profit from
the additional motivation provided by incentive conditions. Second, the lack
of a treatment effect for the lower-class black group may have been due to
the fact that all examiners were white females. Literature reviewed by
Sattler and Theye (1967) and Weiss and Betz (1973) has shown that performance
on intelligence tests can be affected by the race of the examiner and/or by
interactions between examiner and examinee race. Furthermore, these same
effects and interactions have been found for examiner/examinee sex factors,
and in the Sweet and Ringness study all students were male and all examiners
were female.

A study employing greater standardization of administrative procedures
was reported by Zontine, Richards, and Strang (1972). In their study, all
instructions and test items for the Peabody Picture Vocabulary Test (PPVT)
were presented by tape recorder to a group of 72 seven- to eight-year-old
children. The role of the examiners was limited to recording answers,
regulating the speed of the tape recorder, and, in the experimental
conditions, controlling the administration of the reinforcer.

All 72 children were administered Form A of the PPVT by tape recorder
without reinforcement. Following the administration of Form A, which served
as control and covariate in the data analysis, the children were assigned
randomly to one of three conditions for the administration of Form B of the
PPVT two months later. Examinees in Group 1 received Form B under conditions
identical to those of the administration of Form A. Students in Group 2 were
given immediate KR in the form of a white light following each correct
response; after each five white lights, a red light was turned on to indicate
to the examinees their cumulative levels of performance. Test administration
to students in Group 3 was the same as that of Group 2, but in addition these
students were given a food reward after earning each red light; thus this
condition added an extrinsic reward to the KR given. Analysis of variance of
the difference scores between Form A and Form B, and analysis of covariance
of the Form B scores using Form A scores as the covariate, showed no
significant differences in Form B performance as a function of differential
treatments.

Knowledge of Results in Computer-Administered Objective Tests 

With the advent of interactive computer systems has come the capability 
of administering tests by computer. One important potential advantage of 
computer-assisted testing procedures in the area of ability measurement is 
the ease with which examinees can be provided immediate information about 
whether their responses to each test item were correct or incorrect. Bayroff
(1964), Ferguson and Hsu (1971), and Weiss and Betz (1973) have suggested that
the provision of immediate KR may have positive motivating effects on examinees.

In spite of the ease with which immediate KR can be provided during the 
administration of an ability test by an interactive computer, only one study 
has investigated the effects of providing KR on a computer-administered 
test. In this study (Betz, 1975), a group of 90 inner-city high school
students, consisting of 27 black and 53 white students, were administered
two vocabulary tests by computer. One test consisted of 40 items that were
generally somewhat too difficult for the average testee. The other test
administered was a 15-item "pyramidal" (Weiss, 1974, pp. 12-17) adaptive test.
The manipulated independent variables in this study were (1) whether or not
immediate KR was given and (2) whether the conventional 40-item test or the
15-item pyramidal test was administered first. The group was classified by
race into black and white subgroups.

The results of a three-way (2 × 2 × 2) analysis of variance on the
conventional test scores showed a main effect only for race; the level of
performance of whites was significantly higher than that of blacks. None of
the two-way interactions was significant, but there was a significant
three-way interaction between race, order, and feedback. Analysis of the
subgroup means indicated that under KR conditions when the conventional test
had been administered first, the mean score obtained by the blacks (26.4) was
not significantly different from that obtained by the whites (26.0). Under all
other conditions of administration the mean score of the black students was
significantly different from that of the white students.

These data were analyzed by Clara DeLeon.

The finding of no performance differential between blacks and whites under
one set of conditions in which KR was given is certainly an important one,
considering the significance of the main effect found for race and the
widespread finding of lower ability test performance levels for blacks (e.g.,
Loehlin, Lindzey, & Spuhler, 1975). However, the result was found only in the
one order condition and thus is difficult to interpret. Further analysis,
however, revealed that the result might be attributed to motivational effects.
Under KR conditions when the conventional test had been administered first,
blacks omitted almost no items, while under other conditions they omitted
more items than whites. The results of this study must be interpreted with
caution, however, because of the small total sample size and the small number
of black students.

Summary 

The limited number of studies available on the effects of KR on test
performance yield conflicting findings. Studies reported by Beeson (1973),
Heald (1970), Sweet and Ringness (1971), and Betz (1975) suggest that
ongoing KR may facilitate test performance, although the latter two studies
found interactions between the effects of KR and racial/socioeconomic
variables. Studies by Bierbaum (1965), Spencer and Barker (1969), and Strang
and Rust (1973) indicated that examinees made more errors under KR conditions,
while the study by Zontine et al. (1972) found no differences in performance
as a function of KR.

However, the generalizability of these findings is limited. Most ability and
intelligence testing today is done using standardized objective tests,
yet almost all of the evidence relevant to the use of KR on objective tests
comes from studies using unstandardized classroom achievement tests. The lack
of standardization in such tests and the variety of approaches to their
construction may explain the conflicting research findings.

Studies using classroom achievement tests also can be criticized for their
failure to control the medium of test administration and/or mode of test
response (Sympson, 1975). In most studies reported (e.g., Heald, 1970;
Pressey, 1950; Spencer & Barker, 1969), the test using KR has been
administered using some type of punchboard device, while the test not using KR
has been administered using a standard (e.g., IBM) answer sheet. It is
possible that observed performance differences in such cases may be due partly
to different amounts of time taken to respond to items presented under the two
different formats, differing amounts of effort or interest on the part of the
testees, or unfamiliarity with the testing equipment.

In addition to the lack of generalizability and the failure to control
the medium for responding to the test, too little attention has been paid to
how the effects of KR may be moderated by other characteristics of the
examinees or of the tests being administered. For example, two studies
(Sweet & Ringness, 1971; Betz, 1975) found that the effects of KR were
moderated by race and/or social class variables. Sweet and Ringness
hypothesized that upper- and middle-class individuals may be maximally
motivated to do their best and thus may not need the positive motivating
effects of KR.
Finally, no study to date has given attention to the fact that the
quality of the KR given, that is, the extent to which it is predominantly
positive or negative, may influence its effectiveness as a motivational
factor. Conventional ability tests are constructed to be maximally
appropriate in difficulty level to the ability levels of average individuals
in a group. But in these tests, the quality of the KR varies directly with the
ability level of the examinee. That is, high-ability examinees receive mostly
positive (i.e., "correct") KR, average-ability examinees receive about half
positive and half negative (i.e., "incorrect") KR, and low-ability examinees
receive mostly negative KR. On a conventional test, therefore, high-ability
examinees are likely to be encouraged, and thus perhaps motivated, when they
are provided with KR; but low-ability examinees may be discouraged and
frustrated, rather than motivated, when they are provided with KR.

On an adaptive ability test, on the other hand, the items are selected to
be appropriate in difficulty to each individual's ability level rather than
to the mean ability level of some group of examinees, as in conventional
testing procedures. An adaptive test is constructed so that each examinee
answers correctly about half of the items administered; therefore, all
examinees, regardless of ability level, should receive about half positive
and half negative KR. Consequently, the proportion of positive KR on an
adaptive test is relatively constant across individuals of different ability
levels, and its effects may be different from those observed on a
conventional test.
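The contrast between the two kinds of test can be illustrated with the normal ogive item model used for the item pool described later, in which the probability of a correct answer is Phi(a(theta - b)). The sketch below is illustrative only: the 50 items all located at b = 0 with a = 1.0 are hypothetical values, and the "adaptive" case idealizes item selection so that every item's difficulty equals the examinee's ability exactly, which a real adaptive test only approximates.

```python
from math import erf, sqrt

def p_correct(theta, a, b):
    """Normal ogive probability of a correct response: Phi(a * (theta - b))."""
    return 0.5 * (1.0 + erf(a * (theta - b) / sqrt(2.0)))

def expected_positive_kr(theta, difficulties, a=1.0):
    """Expected share of 'correct' feedback messages over a set of items."""
    return sum(p_correct(theta, a, b) for b in difficulties) / len(difficulties)

# Hypothetical conventional test: 50 items, all of difficulty b = 0.0.
conventional = [0.0] * 50

for theta in (-1.5, 0.0, 1.5):
    conv = expected_positive_kr(theta, conventional)
    # Idealized adaptive test: every item's difficulty equals the examinee's ability.
    adaptive = expected_positive_kr(theta, [theta] * 20)
    print(f"theta={theta:+.1f}  conventional={conv:.2f}  adaptive={adaptive:.2f}")

# theta=-1.5  conventional=0.07  adaptive=0.50
# theta=+0.0  conventional=0.50  adaptive=0.50
# theta=+1.5  conventional=0.93  adaptive=0.50
```

Under these assumptions the share of positive KR rises steeply with ability on the conventional test but stays near one half for every examinee on the idealized adaptive test.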

Purposes of the Present Study 

The purpose of the present study was to examine the effects of immediate
knowledge of results on a computer-administered test of verbal ability. An
additional focus of the study was to determine whether the effects of KR
differed for conventional and adaptive tests, or for testees of different
ability levels.

While the major dependent variable of interest was level of performance on
the ability test, the effects of KR on two other aspects of test-taking
behavior, response latency and response consistency, also were studied. In
addition, the effects of KR versus no-KR conditions on psychometric
characteristics of the conventional test were examined. The study also was
concerned with the duration of KR effects, in terms of whether receiving KR
had any effects on performance on a test given subsequently under no-KR
conditions.



METHOD 
Design 

Independent Variables 

This study used a randomized block analysis of variance design with
three independent variables. The blocking variable was subject group. Groups
were high-ability and low-ability college students. Within each group each
student was randomly assigned to one of four treatment conditions resulting
from the cross-classification of two conditions of knowledge of results (KR)
and two different strategies of measuring ability. In one condition of the
KR factor, examinees were informed after each response whether their responses
were correct or incorrect. If the response was incorrect, they were informed
of the correct multiple-choice alternative. In the other condition,
examinees did not receive KR.
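The within-block randomization described above can be sketched as follows. The cell labels, the student identifiers, and the choice to deal shuffled students to cells in rotation (which keeps cell sizes within a block equal to within one) are assumptions for illustration; the report states only that students were randomly assigned to the four conditions within each ability group.

```python
import random
from itertools import product

KR_LEVELS = ("KR", "No-KR")
STRATEGIES = ("conventional", "stradaptive")
CELLS = list(product(KR_LEVELS, STRATEGIES))  # the four treatment conditions

def assign_within_blocks(students_by_group, seed=None):
    """Randomly assign students to the four KR x strategy cells, separately
    within each ability group (the blocking variable)."""
    rng = random.Random(seed)
    assignment = {}
    for group, students in students_by_group.items():
        shuffled = list(students)
        rng.shuffle(shuffled)
        # Deal the shuffled students to cells in turn, so that cell sizes
        # within a block differ by at most one (an assumed balancing rule).
        for i, student in enumerate(shuffled):
            kr, strategy = CELLS[i % len(CELLS)]
            assignment[student] = (group, kr, strategy)
    return assignment

# Hypothetical student identifiers in the two ability blocks.
blocks = {"high": [f"H{i:02d}" for i in range(8)], "low": [f"L{i:02d}" for i in range(8)]}
for student, cell in sorted(assign_within_blocks(blocks, seed=1).items()):
    print(student, cell)
```

Shuffling within each block, rather than across the whole sample, is what keeps ability group from being confounded with the KR and test-strategy factors.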

One strategy of measuring ability was a 50-item conventional ability test,
in which item difficulties were concentrated at the median ability level of
the high-ability group. The other was an adaptive ability test, in which
the items were selected to be appropriate to each individual's ability level.
The adaptive testing strategy used was the stradaptive test (Weiss, 1973).

Dependent Variables 

The primary dependent variable of interest was performance level on the 
ability test. Two methods of scoring the conventional test and two methods of 
scoring the stradaptive test were used to obtain estimates of ability.
Alternate scoring methods were used to determine whether the obtained pattern
of results differed as a function of the methods of scoring the tests.
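The abstract describes the primary scores as maximum likelihood ability estimates, and the item pool section notes that normal ogive a and b parameters were available for every item. A minimal sketch of such an estimate is shown below, assuming those item parameters are known and using a simple grid search in place of whatever numerical method the authors actually used; the grid limits, step size, and example record are hypothetical. When every answer is correct (or every answer is wrong) the likelihood has no interior maximum, so the sketch simply returns a grid boundary.

```python
from math import erf, log, sqrt

def p_correct(theta, a, b):
    """Normal ogive item response function: Phi(a * (theta - b))."""
    return 0.5 * (1.0 + erf(a * (theta - b) / sqrt(2.0)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response vector given (a, b) item parameters."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = min(max(p_correct(theta, a, b), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += log(p) if u == 1 else log(1.0 - p)
    return total

def ml_theta(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximum likelihood estimate of ability (theta)."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical five-item record: (a, b) pairs and the examinee's 0/1 answers.
items = [(1.0, -1.0), (0.8, -0.5), (1.2, 0.0), (0.9, 0.5), (1.1, 1.0)]
responses = [1, 1, 1, 0, 0]
print(round(ml_theta(items, responses), 2))
```

Because the same function can score any set of administered items, it applies equally to a fixed conventional test and to the different item sets each examinee receives on an adaptive test.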

Response latency was also a dependent variable of interest. Response 
latency was measured as the elapsed time from the presentation of a test item 
until the testee responded to the item. Response latencies were analyzed to 
determine whether they were affected by the provision of KR. 

A third dependent variable was response consistency. In addition to
providing estimates of ability level, the stradaptive testing strategy yields
measures of the consistency of an individual's responses to test items (Weiss,
1973, pp. 26-27; 1974, pp. 52-53). Response consistency in an ability test
reflects the degree of confidence that can be placed in a given estimate
of ability level. Indices of response consistency were used to determine
whether examinees responded in a more consistent manner under KR than under
no-KR conditions.

To study the effects of providing KR on the psychometric properties of 
the conventional test, its internal consistency reliabilities within KR and 
no-KR conditions were compared. 
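Internal consistency reliability for a test of dichotomously scored items is commonly indexed by Kuder-Richardson formula 20 (KR-20). This section does not name the coefficient the authors used, so KR-20 is an assumption here. The sketch computes it from a persons-by-items matrix of 0/1 scores; applying it separately to the KR and no-KR groups yields the two coefficients to be compared. The small response matrices are hypothetical.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores.

    KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var(X)), where p_j is the proportion
    passing item j, q_j = 1 - p_j, and var(X) is the (population) variance of
    the total scores.
    """
    n = len(scores)
    k = len(scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)

# Hypothetical usage: one coefficient per KR condition.
kr_group = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
no_kr_group = [[1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0]]
print(f"KR: {kr20(kr_group):.3f}  No-KR: {kr20(no_kr_group):.3f}")
```

The item variances p_j q_j and the total-score variance both use the population (divide-by-n) form, which keeps the two terms of the formula on the same footing.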

While the subjects studied by Pressey (1950) reported that they liked
receiving KR, they also indicated feeling frustrated when taking tests on
which they no longer received it. Since frustration or other reactions to
changed conditions may influence test performance, the design of the present
study permitted the investigation of whether receiving KR had any
effects on performance on a test given immediately afterwards under standard
(i.e., no-KR) administrative conditions.

Test Construction 

Item Pool 

The item pool used to construct the conventional and stradaptive tests of
verbal ability consisted of five-alternative multiple-choice vocabulary items.
The items were normed on University of Minnesota students, most of whom were
from the College of Liberal Arts (see McBride & Weiss, 1974). Normal ogive
difficulty (b) and discrimination (a) parameter estimates were available for
each item. The pool contained about 400 vocabulary items that had a values
greater than or equal to .30. The difficulty levels of these items were
distributed across the continuum of underlying ability, with most values
falling between ±3 standard units.

Stradaptive Test

Item structure and branching. For construction of the stradaptive test,
the items in the pool were grouped into nine levels, or strata, on the basis of
their difficulties. (See Appendix Table A-1 for the difficulties and discrimi-
nations of all items in the stradaptive test.) Each stratum included items
whose range of difficulty (i.e., the difference in difficulty values between
the most and the least difficult items in the stratum) was .67. There was no
overlap in item difficulties between adjacent strata. Items ranged in diffi-
culty from b = -3 to b = +3.

Once items had been grouped into difficulty levels, they were selected
for inclusion in the test on the basis of their discriminating power. For any
one stratum, the most highly discriminating item was selected first, and each
successive item chosen had a lower discrimination. In this way, 30 items
were selected for each stratum for which there were sufficient items available
in the pool. However, no item having a discrimination less than a = .30 (which
corresponds approximately to a biserial item-total score correlation of .28)
was considered acceptable; as a result the strata at the extreme levels of
difficulty did not contain 30 items. The smallest number of items in a stratum
was 17. A total of 243 items comprised the stradaptive item structure.
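The stratum construction and item selection just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the original test-construction software: it assumes nine equal-width strata spanning b = -3 to b = +3 (width 6/9, about .67), and the names `Item`, `stratum_of`, and `build_strata` are hypothetical.

```python
from dataclasses import dataclass

N_STRATA = 9
B_MIN, B_MAX = -3.0, 3.0
WIDTH = (B_MAX - B_MIN) / N_STRATA  # 6/9, about .67 difficulty units per stratum


@dataclass
class Item:
    item_id: str
    a: float  # normal ogive discrimination
    b: float  # normal ogive difficulty


def stratum_of(b: float) -> int:
    """Map a difficulty to stratum 1 (easiest) .. 9 (hardest).

    Values outside [-3, +3] (and b = +3 itself) are clamped to the end strata.
    """
    index = int((b - B_MIN) // WIDTH) + 1
    return min(max(index, 1), N_STRATA)


def build_strata(pool, max_per_stratum=30, min_a=0.30):
    """Group items by stratum, drop items with a < .30, keep the 30 highest a."""
    strata = {s: [] for s in range(1, N_STRATA + 1)}
    for item in pool:
        if item.a >= min_a:
            strata[stratum_of(item.b)].append(item)
    for s in strata:
        # Most discriminating item first, as in the selection rule above.
        strata[s].sort(key=lambda it: it.a, reverse=True)
        del strata[s][max_per_stratum:]
    return strata
```

Extreme strata simply end up with fewer than 30 items when too few items clear the a = .30 cutoff, which mirrors the shortfall the text reports.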

Entry into the stradaptive test was determined on the basis of the exam-
inee's self-reported grade-point average (GPA). Appendix Table A-2 indicates
the entry stratum corresponding to each of nine GPA intervals. Examinees
reporting high GPAs began the test with more difficult items than did those
reporting lower GPAs. Examinees were branched through the stradaptive item
structure according to the rule that following a correct response, the most
discriminating item remaining in the next more difficult stratum was adminis-
tered, and following an incorrect response, the most discriminating item
remaining in the next less difficult stratum was administered.
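A minimal sketch of this branching rule, assuming each stratum's items are already listed in order of decreasing discrimination. The text does not say what happens after a correct answer in the most difficult stratum (or an incorrect answer in the easiest); the sketch assumes the examinee stays in that extreme stratum. The helper `next_item` is hypothetical.

```python
def next_item(strata, used, current_stratum, last_correct):
    """Apply the up-one/down-one branching rule.

    strata: dict mapping stratum number (1 = easiest) to item ids ordered by
    decreasing discrimination. used: set of item ids already administered.
    Returns (stratum, item_id); item_id is None if that stratum is exhausted.
    """
    target = current_stratum + 1 if last_correct else current_stratum - 1
    # Assumption: at the top or bottom stratum the examinee stays where they are.
    target = min(max(target, min(strata)), max(strata))
    for item_id in strata[target]:
        if item_id not in used:
            return target, item_id
    return target, None
```

For example, with `strata = {1: ["e1", "e2"], 2: ["m1", "m2"], 3: ["h1"]}`, a correct answer in stratum 2 leads to `(3, "h1")`.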

Testing was terminated when either a ceiling stratum had been identified
or 75 items had been administered. Since the items used were five-alternative
multiple-choice items, the ceiling stratum was defined as that stratum where
the examinee answered 20% or fewer of the items correctly, based on a minimum
of five items administered at that stratum. However, for some examinees the
response pattern never permitted the identification of a ceiling stratum.
This could happen only for very high-ability examinees capable of responding
at better than chance level even at the most difficult stratum. If a ceiling
stratum had not been identified after the administration of 75 items, testing
was terminated.
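The termination rule can be expressed as a small check run after each response. This is an illustrative sketch: the function names are hypothetical, and returning the *lowest* qualifying stratum when more than one meets the definition is an assumption (for the stopping decision itself, only whether any ceiling exists matters).

```python
from collections import defaultdict

MAX_ITEMS = 75


def find_ceiling(responses, min_items=5, max_prop_correct=0.20):
    """responses: (stratum, correct) pairs in administration order.

    Returns the lowest stratum with at least five items administered and
    20% or fewer answered correctly (chance level for five-choice items),
    or None if no such stratum exists yet.
    """
    tally = defaultdict(lambda: [0, 0])  # stratum -> [administered, correct]
    for stratum, correct in responses:
        tally[stratum][0] += 1
        tally[stratum][1] += int(correct)
    ceilings = [s for s, (n, k) in tally.items()
                if n >= min_items and k / n <= max_prop_correct]
    return min(ceilings) if ceilings else None


def should_stop(responses):
    """Stop once a ceiling stratum is found or 75 items have been given."""
    return find_ceiling(responses) is not None or len(responses) >= MAX_ITEMS
```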

Scoring. Both ability level scores and consistency scores were calculated
for stradaptive test response protocols.
In the stradaptive test, examinees answer different numbers of items, and
the items that they answer vary in difficulty according to the individual's
ability level. Thus, simple number-correct scores are not appropriate as
ability estimates. However, maximum likelihood scores (Birnbaum, 1968) are
appropriate because they take into account the difficulty and discrimination of
each item administered and because they do not depend on the number of items
administered to an individual. Accordingly, maximum likelihood scores were
calculated for the stradaptive test.

The likelihood equation for the 3-parameter logistic model given by
Birnbaum (1968, p. 459) was solved for the maximum likelihood estimate of each
examinee's ability. Difficulty and discrimination parameters used for each
item administered are those given in Appendix Table A-1. The guessing param-
eter (c) was set at .20 since each item had five response alternatives. Input
into the scoring program consisted of each examinee's vector of 1's and 0's,
corresponding to correct and incorrect responses respectively, along with the
corresponding item parameters.
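The scoring step can be illustrated with a short sketch that maximizes the 3-parameter logistic likelihood for one examinee, with the guessing parameter fixed at c = .20 as in the text. Two assumptions should be flagged: the scaling constant D = 1.7 (commonly used so that the logistic curve approximates the normal ogive on which the a and b values were estimated), and a simple grid search, which stands in for whatever numerical method the original scoring program used. All function names are hypothetical.

```python
import math

D = 1.7   # assumed scaling constant (logistic approximating the normal ogive)
C = 0.20  # guessing parameter fixed at .20, as in the text


def p_correct(theta, a, b, c=C):
    """3-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, responses, items):
    """responses: 1 = correct, 0 = incorrect; items: (a, b) pairs in the same order."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def ml_theta(responses, items, lo=-4.0, hi=4.0, step=0.001):
    """Grid-search maximum likelihood estimate of ability on [lo, hi].

    All-correct or all-incorrect patterns have no interior maximum, so the
    search then returns a bound of the grid.
    """
    n_steps = int(round((hi - lo) / step))
    best_theta, best_ll = lo, -math.inf
    for i in range(n_steps + 1):
        theta = lo + i * step
        ll = log_likelihood(theta, responses, items)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

Because the estimate depends only on which items were answered correctly and their parameters, it is comparable across examinees who took different numbers of items, which is the property the text relies on.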

Ten simpler methods of scoring the stradaptive test were proposed by Weiss 
(1973, pp. 20-26). However, results reported by Vale & Weiss (1975a, b) indi- 
cated that the average difficulty of all items answered correctly (Score 8) was 
the best of the ten originally proposed methods of scoring the stradaptive test. 
This score requires fewer assumptions than the maximum likelihood score and con- 
siderably less computational time. Consequently, it was used as a dependent 
variable in this study to determine whether its results were the same as those 
obtained from maximum likelihood scoring. 
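Score 8 needs no model fitting, which is the computational advantage noted above. A minimal sketch, with a hypothetical function name; how the original handled a record with no correct answers is not stated, so returning `None` there is an assumption.

```python
def average_difficulty_correct(difficulties, responses):
    """Score 8: mean b of the items answered correctly (1 = correct, 0 = incorrect)."""
    correct_bs = [b for b, u in zip(difficulties, responses) if u == 1]
    if not correct_bs:
        return None  # undefined when no item was answered correctly (handling assumed)
    return sum(correct_bs) / len(correct_bs)
```

For instance, correct answers to items with b = -1.0 and b = .5 and a miss on b = 1.5 give a score of -.25.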

Weiss (1973) suggested that the consistency of a response pattern might be 
related to the confidence with which ability is measured by a given set of test 
items. Consistency of response for an individual is to some extent analogous
to discrimination indices characterizing items. An item discrimination index
reflects the extent to which people having higher levels of the trait of
interest respond correctly to an item more often than do people having lower
levels of that trait. Similarly, individuals should respond correctly to easier
items more often than they respond correctly to more difficult items. If
individuals answer many easy items (i.e., items below their ability levels)
incorrectly and many difficult items (i.e., items above their ability levels)
correctly, they are responding inconsistently, and it may be inferred that
something besides the trait of interest is influencing their responses. In
general, consistent testees are those whose response records contain less
variability in the difficulties of items they encounter and answer correctly.
More consistent testees will also answer items drawn from a smaller number of
strata.

Weiss (1974, pp. 52-53) suggested five different consistency scores for 
use with the stradaptive test. Research by Vale and Weiss (1975a, b) and anal- 
yses of the present data (Betz, 1976) indicated that there are two clusters in 
these consistency scores; consequently, one score was selected as representa- 
tive of each cluster. Consistency Score 1 (Score 11 in Vale & Weiss, 1975a, b) 
is defined as the standard deviation of the difficulties of all items encoun- 
tered by a testee. Consistency Score 2 (Score 15 in Vale & Weiss) is the 
number of strata between the basal and ceiling strata. This score corrects 
for inappropriate entry points, or entry strata which are below the basal stra- 
tum or above the ceiling stratum. 
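Consistency Score 1 can be stated directly in code. The sketch below assumes the population form of the standard deviation (dividing by the number of items); the text does not say which form was used, and the function name is hypothetical. Consistency Score 2 is not sketched because the basal stratum it depends on is not defined in this section.

```python
import statistics


def consistency_score_1(difficulties_encountered):
    """Consistency Score 1: SD of the b values of every item administered.

    Uses the population SD (divide by n); this choice is an assumption.
    Smaller values indicate a more consistent response record.
    """
    return statistics.pstdev(difficulties_encountered)
```

A testee who oscillates between two adjacent strata, e.g. `[0.0, 0.5, 0.0, 0.5]`, receives a smaller score than one whose record ranges widely, e.g. `[-2.0, 1.0, -1.0, 2.0]`.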
In summary, maximum likelihood ability level scores, an average diffi-
culty ability score, and two consistency scores were selected for analysis of
performance and test-taking behavior on the stradaptive test. The maximum
likelihood score was comparable to that used for the conventional test and thus
permitted direct inter-strategy comparisons. The remaining scores were unique
to the stradaptive test and therefore were analyzed only within that testing
strategy.

Peaked Conventional Test 

The peaked conventional test consisted of 50 items with difficulty values
concentrated around b = -.20 and discrimination values greater than or equal
to a = .40. The characteristics of the 50 items constituting the test are
summarized in Table 1. While the mean difficulty value was b = -.20, the
easiest item had b = -.97, and the most difficult item had b = .56. The
average item discrimination (a = .89) was considerably higher than the
minimally acceptable level (a = .40), but there was considerable variation
among items. Appendix Table A-3 provides the normal ogive difficulty and
discrimination parameter values characterizing each item in the test. Items
were administered in the order shown in Appendix Table A-3.



Table 1 

Summary of the Characteristics 
of Items in the 50-Item Conventional Test 



           Item Difficulty (b)                 Item Discrimination (a)
   Mean    S.D.   Minimum   Maximum       Mean    S.D.   Minimum   Maximum
   -.20     .38     -.97       .56         .89     .34      .41      1.90



The conventional test was scored using simple number-correct scores and
maximum likelihood scores based on Birnbaum's (1968) three-parameter logistic
model.

Post-Test 

To determine whether there were any carry-over effects on later test per-
formance for students who had received KR on the initial test, a 44-item post-
test was administered to all testees following the administration of the
experimental (i.e., peaked conventional or stradaptive) test. This test was
constructed by selecting items from a pool of 120 vocabulary items from the
Cooperative School and College Ability Tests,^1 forms 2A, 2B, 3A, and 3B. The
items, like those in the item pool used for the stradaptive and peaked conven-
tional test, were five-alternative multiple-choice vocabulary items. They
were normed in a population of high school students, and normal ogive difficulty
and discrimination parameters were available for each item. The test was con-
structed to have a rectangular distribution of item difficulties; that is, item
difficulties were spaced approximately evenly across the ability/difficulty
continuum and thus included very easy to very difficult items.

^1These items were made available for research use by Educational Testing
Service.

Table 2 shows the mean and standard deviation of the normal ogive diffi-
culty and discrimination values characterizing the 44 items in the test. While
the mean difficulty of these items (b = -.19) was almost identical to that of
the items in the peaked conventional test (b = -.20), the normative populations
from which the item parameters were derived differed substantially (i.e., high
school students for the post-test parameters and college students for the ex-
perimental test parameters). It was expected, therefore, that the post-test
items would be easier for college students (the population of interest in the
present study) than would be items from the peaked test having numerically
comparable item difficulty values.



Table 2 

Summary of the Characteristics 
of the Items in the 44-Item Post-Test 



           Item Difficulty (b)                 Item Discrimination (a)
   Mean    S.D.   Minimum   Maximum       Mean    S.D.   Minimum   Maximum
   -.19    1.37    -2.85      2.62        1.22     .40      .51      1.94



Appendix Table A-4 provides normal ogive difficulty and discrimination 
parameters for each of the 44 items in the post-test. Items were adminis- 
tered in the order indicated in Table A-4. Number-correct scores were deter- 
mined for each testee. 

Procedure 

Subjects 

Two groups of students participated in this study. The first group con-
sisted of 239 students taking the introductory psychology course in the College
of Liberal Arts (CLA) at the University of Minnesota. The second group consisted
of 111 students from psychology courses in the University's General College (GC).
Both groups received two points toward their final course grade for participation
in the experiment. The CLA students were considered a High-Ability group, i.e.,
a group consisting of people who typically perform relatively well on ability
and scholastic aptitude tests. General College has lower admission standards
than does CLA. Thus the GC students comprised the Low-Ability group, based on
their lower mean ability level on standard tests of ability and scholastic apti-
tude.

Test Administration 

All students were tested at individual cathode-ray terminals (CRTs) con- 
nected to a Hewlett-Packard 9600E Real-Time computer system. Test items were 
presented at 960 characters per second on the CRT screen, and testees indicated 
their responses by typing in the number corresponding to the chosen alternative
for each five-alternative multiple-choice item. Instructional screens explain- 
ing the operation of the CRTs were provided prior to testing (see DeWitt & 
Weiss, 1974, pp. 36-53), and a proctor was present in the testing room to pro- 
vide assistance to any testee having difficulty with the equipment or instruc- 
tions. Students were permitted as much time as necessary to complete the 
tests and were so informed before testing was begun. 

Experimental treatment. Immediate knowledge of results was provided to
one half of the examinees. After the examinee responded to a test item, a
message appeared on the screen below the item just answered. A correct response
to the item was followed by the message, "That's correct." An incorrect
response was followed by the message, "That's not correct. The correct answer
is x," where x was the number corresponding to the correct multiple-choice
alternative. In both cases, the testees were then allowed to examine the item
and were to press the "return" key when they were ready for the administration
of the next item. In the groups that did not receive KR, a new item was pre-
sented immediately following the examinees' responses to the previous item.

Testing sequence. After examinees had completed the instructional
screens and had answered several identification and demographic questions, 
test administration was begun. First, either the 50-item peaked conventional 
test or the stradaptive test was administered with or without KR. Second, 
testees were administered several items concerning their reactions to the 
testing situation and, in the KR group only, their reactions to the provision 
of immediate knowledge of results (analyses of these data are reported by 
Betz & Weiss, 1976). Following completion of the reaction items, all 
examinees were administered the 44-item post-test. 

Data Analysis 

Several types of data were available for all individuals participating 
in the study, while other data were available only for testees completing 
either the stradaptive or the conventional experimental test. Data available 
for all testees included: 1) maximum likelihood ability estimates (scores)
for the experimental test; 2) post-test number-correct scores; and 3) response
latency data for each item administered. Data available for subgroups of
testees included: 1) number-correct scores for examinees completing the
peaked conventional experimental test; and 2) the average difficulty score
and two consistency scores for examinees completing the stradaptive experi-
mental test.

Analysis of Ability Estimates 

Mean differences. Maximum likelihood ability estimates obtained from the
conventional and stradaptive tests were analyzed using a three-way analysis of
variance. The three factors (KR, testing strategy, and ability group) were
completely crossed and each had two levels. Because cell frequencies in the
three-way crossed classification were neither exactly equal nor proportional,
it was necessary to use computational procedures in the analysis of variance to
account for the lack of orthogonality among main effects and between main
and interaction effects. Computations were based on the "classic experi-
mental" approach described by Nie, Hull, Jenkins, Steinbrenner, and Bent
(1975, pp. 405-408).

Within each of the eight groups resulting from the 2 x 2 x 2 design, the
mean and standard deviation of scores were calculated. Since each experi-
mental variable consisted of two levels, significant main effects indicated a
significant difference between the two means involved. To determine which
combinations of testing conditions resulted in significantly high or low
test performance, comparisons of the subgroup means were made using Scheffé's
(1959) method.

Two-way analyses of variance and post-hoc comparisons also were used for 
the analysis of ability estimates obtainable from only one of the two experi- 
mental tests. For the peaked conventional test the number-correct score was 
analyzed, and for the stradaptive test the average difficulty score was 
analyzed. 

Internal consistency reliability. The internal consistency reliability
of the peaked conventional test was calculated using Cronbach's (1951) alpha
formula for the total group of examinees taking the test and separately with-
in the KR and No-KR subgroups. The significance of the difference between
the reliability coefficients under KR and No-KR conditions was calculated
using the formula suggested by Glass and Stanley (1970, p. 311). The formula
is based on Fisher's Z transformation of r and was applied to the alpha reli-
ability values, $r_1$ and $r_2$, after conversion to $Z_1$ and $Z_2$:

$$z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$

where $n_1$ and $n_2$ are the numbers of examinees in the KR and No-KR groups.
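The two computations named above can be sketched as follows: coefficient alpha from a matrix of 0/1 item scores, and the Fisher Z comparison of two independent coefficients. The example plugs in the alpha values reported later in this report (.89 under KR, .91 under No-KR) with the conventional-test group sizes from Table 3 (88 and 85). Treating alpha like a correlation in this way follows the procedure the text describes; the function names are hypothetical.

```python
import math
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha; scores is a list of examinee rows of 0/1 item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def compare_independent(r1, n1, r2, n2):
    """z for the difference between two independent coefficients via Fisher's Z."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    return (z1 - z2) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))


z = compare_independent(0.89, 88, 0.91, 85)  # KR vs No-KR, conventional test
print(round(z, 2))  # -0.68
```

A z of about -.68 is well short of the 1.96 needed at the .05 level, consistent with the nonsignificant difference between the two reliabilities.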

Other Response Characteristics 

Response consistency. Two-way analyses of variance of each of the two
consistency scores derived from the stradaptive test records were completed
using ability group and KR as the independent variables. Mean scores within
each treatment combination were calculated, and a posteriori contrasts were
studied.

Response latencies. The response latency for each item administered to
each individual was available for both the stradaptive and conventional tests.
Response latencies were recorded from the time the display of a test item was
begun until the examinee pressed the "return" key to record his answer to the
item. Latencies, in seconds, were accurate to 1/10 second. The mean response
latency over all items administered was calculated for each testee, thus yield-
ing a latency "score" for each individual. Latency scores were analyzed using
a three-way analysis of variance.

Carry-over Effects 

Post-test scores. Within the KR and No-KR treatment groups, the number-
correct scores on the post-test were analyzed using a three-way analysis of
variance, with KR, testing strategy, and ability group as the independent
variables. Means and standard deviations of scores were calculated within each
treatment-subject group combination, and contrasts on the means were made.
Correlation of experimental and post-test scores. To determine whether
KR affected the relative positions of individuals within a group, correlations
between post-test number-correct scores and the experimental test maximum like-
lihood ability estimates were calculated. These correlations were calculated
separately for groups completing the conventional and stradaptive tests under
both KR and No-KR conditions. To determine whether the relationship between
experimental test scores and post-test scores differed between KR and No-KR
conditions, the differences among the four correlation coefficients were tested
for statistical significance using the procedure suggested by Glass and Stanley
(1970, pp. 311-313).
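One standard way to test whether several independent correlations differ is a chi-square test on their Fisher Z values, each weighted by n - 3, with k - 1 degrees of freedom. The sketch assumes this is essentially the Glass and Stanley procedure cited above. The correlations in the usage line are hypothetical placeholders, not the obtained values; the group sizes are those from Table 3, and the function name is hypothetical.

```python
import math

CHI2_CRIT_DF3_05 = 7.815  # chi-square critical value, df = 3, alpha = .05


def homogeneity_chi2(rs, ns):
    """Chi-square statistic for H0: k independent correlations share one value.

    Each r is converted to Fisher's Z and weighted by n - 3; df = k - 1.
    """
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))


# Hypothetical correlations; Ns are conventional KR, conventional No-KR,
# stradaptive KR, and stradaptive No-KR from Table 3.
chi2 = homogeneity_chi2([0.70, 0.65, 0.72, 0.60], [88, 85, 87, 89])
print(round(chi2, 2), chi2 > CHI2_CRIT_DF3_05)
```

With four coefficients the statistic is referred to a chi-square distribution with 3 degrees of freedom; a value above about 7.81 would indicate that the correlations differ at the .05 level.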



RESULTS 


Analysis of Ability Estimates 

Maximum Likelihood Scores 

Table 3 shows the results of the three-way analysis of variance of the
maximum likelihood ability estimates obtained from the conventional and stradap-
tive tests. Table 3 also indicates the numbers of examinees and the means and
standard deviations of scores associated with each treatment combination and
for combined treatments. As the table indicates, there were significant main
effects for Ability Group and for KR, but there were no significant interaction
effects. Only the interaction between Ability Group and test approached statis-
tical significance (p=.122).

As shown in Table 3, the overall mean level of performance of the High- 
Ability group (-.26) was significantly (p<.01) higher than that of the Low- 
Ability group (-.87), supporting their a priori ability level designations. 
Table 3 also shows that the performance level of both groups was significantly 
(p<.05) higher under KR conditions than under No-KR conditions. 

Figure 1 shows a plot of the means for the eight experimental groups. Con- 
trasts on the means for the eight subgroups indicated that there were three 
subgroups of means which were not significantly different within subgroups. The 
dashed lines in Figure 1 differentiate those three subgroups. 

As Figure 1 shows, in both subject groups performance on the conventional
test was significantly better under KR conditions; the High-Ability-KR mean
(-.06) was significantly greater than the High-Ability-No-KR mean (-.43), and
the Low-Ability-KR mean (-.87) was significantly higher than the Low-Ability-
No-KR mean (-1.20). On the stradaptive test, however, although the level of
performance of the High-Ability group under KR conditions (-.19) was
significantly greater than that under No-KR conditions (-.39), the differences
for the Low-Ability group (-.69 and -.72) and for the combined groups (i.e.,
High-Ability and Low-Ability) were not statistically significant.

Figure 1 also shows that significant differences between ability level
groups were not found under all testing conditions. Although the overall level
of performance in the High-Ability group was significantly higher than that of
the Low-Ability group (-.26 vs. -.87), the performance of the Low-Ability
group on the stradaptive and conventional tests under KR conditions was not



Table 3

Means and Standard Deviations of Maximum Likelihood
Ability Estimates for Conventional and Stradaptive Tests in
High- and Low-Ability Groups With and Without KR,
and Three-Way ANOVA Results

                                       Experimental Condition
                             KR                     No-KR            Combined Conditions
Test and Group          N    Mean   S.D.       N    Mean   S.D.       N    Mean   S.D.

Conventional Test
  High-Ability          60   -.06   1.04       57   -.43    [?]      117   -.24   1.14
  Low-Ability           28   -.87    .84       28  -1.20   1.40       56  -1.03   1.16
Stradaptive Test
  High-Ability          60   -.19   1.21       62   -.39    .91      122   -.29   1.07
  Low-Ability           27   -.69    .79       27   -.72    [?]       54   -.71    .83
Combined Groups
  Conventional Test     88   -.31   1.05       85   -.68    [?]      173   -.49   1.20
  Stradaptive Test      87   -.35   1.12       89   -.49    .91      176   -.42   1.02
  High-Ability         120   -.12   1.13      119   -.41   1.07      239   -.26   1.10
  Low-Ability           55   -.78    .82       55   -.97    [?]      110   -.87   1.02
  Total Group          175   -.33   1.09      174   -.58   1.14      349   -.46   1.11

Three-Way Analysis of Variance

Source of Variation            Sum of Squares    DF    Mean Square       F      p^a
Main Effects                         33.84         3        11.28        9.79   .001
  Ability Group                      27.67         1        27.66       23.99   .001
  Test                                 .42         1          .42         .36   .999
  KR                                  5.63         1         5.63        4.88   .026
Two-Way Interactions                  3.90         3         1.30        1.13   .340
  Ability Group x Test                2.71         1         2.71        2.35   .122
  Ability Group x KR                   .17         1          .17         .15   .999
  Test x KR                           1.02         1         1.02         .89   .999
Three-Way Interaction
  Ability Group x Test x KR            .07         1          .07         .06   .999
Residual                            393.15       341         1.15
Total                               430.96       348         1.24

^aEstimated probability of error in rejecting null hypotheses.



significantly lower than that of the High-Ability group under No-KR conditions 
on both tests. It may be noted further that the performance of the High-Ability 
group was highest under KR conditions, while the performance level of the Low- 
Ability group was high under KR conditions and on the stradaptive test in general. 
From these results it appears that the performance of the High-Ability
group was enhanced when KR was given regardless of testing strategy. On the
other hand, performance of the Low-Ability group was improved under either KR



Figure 1 

Mean Maximum Likelihood Ability Estimates 
as a Function of Testing Strategy, KR, and Ability Group 
conditions or by administration of an adaptive test. When no KR was provided on
the conventional test (the conditions typical of most standard testing proce-
dures), the performance level of Low-Ability individuals was significantly lower
than that of the High-Ability individuals. This performance differential
between the two groups did not appear, however, when the adaptive test was
administered under No-KR conditions.

Other Ability Scores 

Conventional test. The means for the number-correct scores within each
KR-subject group combination and the results of the two-way analysis of 
variance of the means are shown in Table 4. These results show significant 
main effects for both Ability Group and KR, but no interaction effects. Again, 
the performance of the High-Ability group was significantly better than that of 
the Low-Ability group, and the overall mean score under KR conditions was 
higher than that under No-KR conditions. However, contrasts on the subgroup 



Table 4

Means and Standard Deviations of Number-Correct Scores
on the 50-Item Conventional Test for Two Ability Level Groups
With and Without KR, and Two-Way ANOVA Results

                                 Experimental Condition
                       KR                      No-KR            Combined Conditions
Group            N    Mean    S.D.       N    Mean    S.D.       N    Mean    S.D.
High-Ability     60   30.47   9.20       57   27.10   10.31     117   28.83    9.90
Low-Ability      28   22.54   8.28       28   20.71    9.39      56   21.62    8.90
Total            88   27.94   9.61       85   25.00   10.41     173   26.50   10.09

Two-Way Analysis of Variance

Source of Variation     Sum of Squares    DF    Mean Square       F      p^a
Main Effects                  2319.82       2       1159.91      12.92   .001
  Ability Group               1945.29       1       1945.29      21.67   .001
  KR                           354.28       1        354.28       3.95   .046
Ability Group x KR              22.45       1         22.45        .25   .999
Error                        15170.98     169         89.77
Total                        17513.25     172        101.82

^aEstimated probability of error in rejecting null hypotheses.



means indicated that while the High-Ability-KR mean (30.47) was significantly
greater than the Low-Ability-KR (22.54) mean or the Low-Ability-No-KR (20.71)
mean, the High-Ability-No-KR mean (27.10) was not significantly different from
the Low-Ability-KR mean (22.54).

Stradaptive test. Mean average difficulty scores as a function of KR and
subject group and the results of the two-way analysis of variance are shown in
Table 5. Only the Ability Group effect was significant in these data. As
expected, the scores of the High-Ability group were higher than those of the
Low-Ability group. Contrasts among the subgroup means indicated that there were
no further significant mean differences. This finding of no KR or interaction
effects is in agreement with the results shown for the stradaptive maximum
likelihood scores within the Low-Ability group, but it does not agree with the
finding of significant KR effects for the High-Ability group using maximum
likelihood scores.



Table 5

Means and Standard Deviations of Average-Difficulty-
of-Items-Answered-Correctly Scores on the Stradaptive Test
for Two Ability Level Groups With and Without KR,
and Two-Way ANOVA Results

                              Experimental Condition
                       KR                   No-KR            Combined Conditions
Group            N    Mean   S.D.       N    Mean   S.D.       N    Mean   S.D.
High-Ability     54   -.21   1.00       59   -.38    .89      113   -.30    .95
Low-Ability      28   -.67    .82       26   -.61    .92       54   -.64    .86
Total            82   -.37    .96       85   -.45    .90      167   -.41    .94

Two-Way Analysis of Variance

Source of Variation     Sum of Squares    DF    Mean Square       F      p^a
Main Effects                     5.28       2          2.67       3.05    .05
  Ability Group                  4.20       1          4.20       4.86    .03
  KR                             1.38       1          1.38       1.59    .21
Ability Group x KR                .81       1           .81        .94    .99
Error                          121.94     141           .86
Total                          128.02     144

^aEstimated probability of error in rejecting null hypotheses.



Internal Consistency Reliability 

Coefficient alpha for the 50-item conventional test was .90 when calcu- 
lated for the total group of examinees taking the test. The reliability of the 
test under KR conditions was .89, while that under No-KR conditions was .91. 
The difference between the reliability coefficients for KR and No-KR conditions 
was not statistically significant. 

Other Response Characteristics 

Response Consistency 

Tables 6 and 7 show means for the two stradaptive consistency scores, by
Ability Group and KR conditions, and the results obtained from the two-way
analyses of variance for each score. There were no significant main or inter-
action effects for either of the scores, nor were there significant differences
among any of the cell means. Thus, response consistency was not significantly
influenced by either KR conditions or by ability level of the testees.






Table 6

Means and Standard Deviations of Stradaptive Consistency Score 1
as a Function of KR Condition and Ability Level Group,
and Results of the Two-Way ANOVA

                              Experimental Condition
                       KR                   No-KR            Combined Conditions
Group            N    Mean   S.D.       N    Mean   S.D.       N    Mean   S.D.
High-Ability     54    .78    .18       59    .74    .17      113    .76    .18
Low-Ability      28    .80    .22       26    .85    .27       54    .82    .26
Total            82    .78    .19       85    .77    .19      167    .78    .19

Two-Way Analysis of Variance

Source of Variation     Sum of Squares    DF    Mean Square       F      p^a
Main Effects                      .07       2           .04       1.14    .32
  Ability Group                   .04       1           .04       1.30    .25
  KR                              .02       1           .02        .83    .99
Ability Group x KR                .06       1           .06       1.99    .16
Residual                         4.31     141           .03
Total                            4.41     144           .03

^aEstimated probability of error in rejecting null hypotheses.



Table 7

Means and Standard Deviations of Stradaptive Consistency Score 2
as a Function of KR Condition and Ability Level Group,
and Results of the Two-Way ANOVA

                              Experimental Condition
                       KR                   No-KR            Combined Conditions
Group            N    Mean   S.D.       N    Mean   S.D.       N    Mean   S.D.
High-Ability     54   1.59   1.25       59   1.83   1.15      113   1.72   1.20
Low-Ability      28   1.64   1.22       26   1.23   1.11       54   1.44   1.19
Total            82   1.61   1.23       85   1.65   1.16      167   1.63   1.19

Two-Way Analysis of Variance

Source of Variation     Sum of Squares    DF    Mean Square       F       p
Main Effects                     1.36       2           .68        .58    .99
  Ability Group                  1.35       1          1.35       1.14    .29
  KR                              .05       1           .04        .03    .99
Ability Group x KR               2.11       1          2.11       1.79    .18
Residual                       166.29     141          1.18
Total                          169.77     144          1.18
Response Latency



Means and standard deviations of response latency scores as a function of
KR, Test, and Ability Group, and the results of the three-way analysis of
variance of mean latency scores, are shown in Table 8. Table 8 indicates that
the only significant main effect was for Ability Group. High-Ability examinees 
took significantly less time to respond to test items than did Low-Ability 
examinees; the mean response time for the former group was 14.9 seconds while 
that of the latter group was 16.7 seconds. Response latency did not differ 
significantly as a function of Test or KR, and there were no significant inter- 
action effects. 



Table 8 

Means and Standard Deviations for Average Intra-Individual 
Response Latency in Seconds, and Three-Way ANOVA Results 



                                  Experimental Condition
                              KR            No-KR          Combined
                                                           Conditions
Test and Group            Mean   S.D.    Mean   S.D.     Mean   S.D.
Conventional Test
  High-Ability            14.4    4.5    14.7    5.0     14.6    4.8
  Low-Ability             15.2    4.4    17.4    6.9     16.3    5.8
Stradaptive Test
  High-Ability            15.2    5.3    15.2    5.1     15.2    5.2
  Low-Ability             18.0    9.1    16.1    5.5     17.1    7.6
Combined Groups
  Conventional Test       14.7    4.5    15.6    5.8     15.1    5.2
  Stradaptive Test        16.1    6.8    15.4    5.2     15.8    6.1
  High-Ability            14.8    4.9    14.9    5.1     14.9    5.0
  Low-Ability             16.6    7.2    16.7    6.2     16.7    6.7
  Total                   15.4    5.8    15.5    5.5     15.4    5.6

Three-Way Analysis of Variance

Source of Variation            DF    Mean Square       F      P^a
KR                              1        1.18         .04    .999
Test                            1       40.72        1.30    .254
Ability Group                   1      246.82        7.87    .006
KR x Test                       1       47.56        1.52    .217
KR x Ability Group              1         .03        .001    .999
Test x Ability Group            1         .35         .01    .999
KR x Test x Ability Group       1       69.16        2.20    .135
Residual                      342       31.37

^a Estimated probability of error in rejecting null hypotheses.



Carry-Over Effects 



Post-Test Scores 

Means and standard deviations of post-test number-correct scores as a
function of Ability Group and of the KR and Test conditions on the
experimental test are shown in Table 9, together with the results of the
three-way analysis of variance of mean post-test scores. As Table 9 shows,
there was a significant main effect for Ability Group: the mean number correct
was 35.6 (about 81% correct) in the High-Ability group and 32.5 (about 74%
correct) in the Low-Ability group. There were no other significant main or
interaction effects, indicating that post-test performance was not affected by
the conditions under which the experimental test had been administered. Thus,
although testing conditions influenced performance while they were in effect,
there were no discernible carry-over effects on a conventional test
administered immediately after the experimental test.



Table 9 

Means and Standard Deviations of Number-Correct Scores 
on the Post-Test as a Function of Experimental Conditions,
and Three-Way ANOVA Results

                                  Experimental Condition
                              KR             No-KR          Combined
                                                            Conditions
Test and Group            Mean   S.D.    Mean   S.D.     Mean   S.D.
Conventional Test
  High-Ability           36.23   5.05   35.47   4.52    35.86   4.79
  Low-Ability            32.93   6.15   32.21   6.08    32.57   6.07
Stradaptive Test
  High-Ability           35.38   5.19   35.31   5.53    35.34   5.34
  Low-Ability            32.86   6.95   32.07   6.37    32.47   6.62
Combined Groups
  Conventional Test      35.18   5.60   34.40   5.28    34.80   5.45
  Stradaptive Test       34.58   5.89   34.33   5.95    34.45   3.90
  High-Ability           35.81   5.11   35.39   5.05    35.60   5.07
  Low-Ability            32.89   6.50   32.14   6.17    32.52   6.32
  Total                  34.88   5.74   34.36   5.62    34.62   5.68

Three-Way Analysis of Variance

Source of Variation            DF    Mean Square       F      P
Ability Group                   1      720.18       23.51    .001
Test                            1       12.61         .41    .999
KR                              1       23.72         .77    .999
Ability Group x Test            1        3.12         .10    .999
Ability Group x KR              1        2.07         .07    .999
Test x KR                       1        4.32         .14    .999
Ability Group x Test x KR       1        2.68         .09    .999
Residual                      342       30.64
Correlation of Experimental and Post-Test Scores

For the conventional test group, the correlation between experimental-test
maximum likelihood scores and post-test scores was higher, although not
significantly so, under No-KR conditions (r=.76) than under KR conditions
(r=.69). On the stradaptive test, the correlation was again higher under No-KR
conditions (r=.79) than under KR conditions (r=.76), but this difference also
was not statistically significant. Thus, providing KR on a verbal ability test
does not produce scores that correlate substantially differently with scores
on another test administered without KR than do scores obtained under typical
(i.e., No-KR) conditions of test administration.
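The report does not say how the differences between these correlations were tested. One standard approach for correlations from independent groups is Fisher's r-to-z transformation, z = (atanh r1 - atanh r2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)), referred to the standard normal distribution. The sketch below assumes that test. The sample size of 88 per condition in the usage line is hypothetical, because the group sizes for these correlations are not given in this section.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed Fisher r-to-z test for the difference between two
    correlations computed in independent samples of sizes n1 and n2."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-tailed p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Conventional test: No-KR (r=.76) vs. KR (r=.69); n=88 per group is hypothetical.
z, p = compare_independent_correlations(0.76, 88, 0.69, 88)
print(f"z={z:.2f}, p={p:.2f}")
```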



SUMMARY AND CONCLUSIONS 
Effects of KR on Test Performance 



The results of the present study indicate that knowledge of results led to
significant increases in test scores for the total group of examinees; that is,
mean test scores were significantly higher under KR conditions than under No-KR
conditions. However, the magnitude of the effects of KR on performance differed
according to whether the test administered was a conventional or a stradaptive
test.

The improvement under KR conditions was substantially greater for the
conventional test than for the stradaptive test. Both the maximum likelihood
and the number-correct scores on the conventional test were significantly
higher under KR conditions than under No-KR conditions; this effect was
significant for the total group of examinees and also within both the
High-Ability and Low-Ability subgroups. Although the KR means were higher than
the No-KR means for the stradaptive test scores, these differences were not
significant either for the total group of examinees or for the Low-Ability
group. Only the stradaptive test maximum likelihood scores in the High-Ability
group were significantly higher under KR conditions.
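The maximum likelihood scores discussed here are ability estimates derived from each examinee's item responses. Because the item parameters reported in Table A-3 are normal ogive parameters, one way to obtain such an estimate is to choose the ability value theta that maximizes the likelihood of the observed response pattern. Under the parameterization assumed here, P(correct) = Phi(a(theta - b)) for an item with discrimination a and difficulty b. The sketch below uses a simple grid search rather than whatever numerical method the original scoring program used, and the item parameters in the usage line are hypothetical.

```python
import math


def normal_ogive(theta, a, b):
    """Probability of a correct response under the normal ogive model."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern; items are (a, b) pairs."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = normal_ogive(theta, a, b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0) far from b
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def ml_ability(items, responses, lo=-4.0, hi=4.0, step=0.001):
    """Grid-search maximum likelihood ability estimate on [lo, hi]."""
    n_steps = int(round((hi - lo) / step))
    best_theta, best_ll = lo, -math.inf
    for i in range(n_steps + 1):
        theta = lo + i * step
        ll = log_likelihood(theta, items, responses)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Hypothetical three-item test: (discrimination a, difficulty b).
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
print(ml_ability(items, [1, 1, 0]))
```

A grid search is slow but transparent and cannot diverge. For an all-correct or all-incorrect pattern the likelihood has no interior maximum, so this sketch simply returns the grid boundary. An operational scoring program needs a separate rule for such patterns.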

Thus, providing KR on a conventional test of ability led to significant
increases in mean test scores for both high- and low-ability testees. Providing
KR on an adaptive test of ability led to increases in test scores for both
ability-level groups, but the increase was statistically significant only
within the high-ability group.

These results indicate that KR alone can enhance ability test performance
regardless of the ability level of the examinee, but only under conventional
testing procedures. Contrary to the hypothesis of Sweet and Ringness (1971),
high-ability examinees achieved significantly higher scores under KR conditions
even though they may generally be highly motivated to do well. Low-ability
examinees achieved significantly higher scores on the conventional test under
KR conditions even though they generally received lower proportions of
positive KR than did the high-ability examinees.
Group Differences and Testing Conditions

On both the conventional test and the stradaptive test, the High-Ability
group obtained significantly higher mean scores than did the Low-Ability group.
Thus, over all testing conditions combined, the performance differential
between the two groups corresponded to that expected on the basis of their
previous levels of performance on ability tests. More importantly, however,
the results indicated that the effects of testing conditions on performance
differed between the two ability-level groups, and that under some conditions
of test administration the performance levels of the two groups were not
significantly different.

The High-Ability group performed consistently and significantly better
under KR conditions than under No-KR conditions on both the conventional and
stradaptive tests, and in this group there were no significant differences
between mean scores on the conventional and stradaptive tests. In contrast,
while the Low-Ability students performed better on the conventional test under
KR conditions, their performance on the stradaptive test did not differ as a
function of KR conditions. Moreover, the performance of this group on the
stradaptive test was consistently better than their performance on the
conventional test, even when the latter test had been administered under KR
conditions. The score means for the Low-Ability group on the stradaptive test
under both KR and No-KR conditions, and on the conventional test under KR
conditions, were not significantly different from each other, but all three
means were significantly higher than the group's mean on the conventional test
under No-KR conditions. Further, these three means in the Low-Ability group did
not differ significantly from the means of the High-Ability group on either the
conventional or the stradaptive test administered under No-KR conditions.

Thus it appears that the performance of low-ability examinees was enhanced
either by providing these students with immediate knowledge of results or by
administering to them an adaptive test of ability. These results imply that,
for low-ability students, adaptive testing might provide the same incentive
effects as does the provision of KR.

Motivating Effects of Adaptive Testing 

The incentive effects of an adaptive test for low-ability individuals may
arise because they perceive themselves as doing relatively well on an adaptive
test in comparison to their usual performance on ability tests. Most
group-administered ability and aptitude tests are constructed to be appropriate
for individuals of average ability in the group for which the test is intended.
Low-ability examinees probably perceive such tests as beyond their capabilities
and may become discouraged early in the test. When these examinees have taken
several tests that are too difficult for them, they may approach later testing
situations with an expectation of further discouragement and failure.

On an adaptive test, however, the items administered to low-ability examinees
will be easier than those administered to average- or high-ability examinees.
The stradaptive test is designed so that testees of all ability levels should
be able to answer about half of the items administered to them correctly, and
indeed, results indicated that the average examinee in the Low-Ability group
obtained 46.5% correct, compared to about 40% correct on the conventional test
with no KR. Consequently, low-ability examinees taking the stradaptive test
probably perceived that, relative to their expectations, they were performing
well on the test. It is possible that this situation served as an incentive
for these individuals to try harder on the stradaptive test.
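The roughly 50% figure follows from branching logic. This sketch assumes a simple up-one/down-one rule: a correct answer moves the examinee to the next more difficult stratum, and an incorrect answer moves the examinee to the next easier one. Under such a rule, item difficulty drifts toward the level where the examinee has about an even chance of success. The simulation is a simplified sketch under stated assumptions:
- each stratum is represented by a single difficulty equal to the stratum means listed in Table A-2;
- all items share a hypothetical discrimination of 1.0 under the normal ogive model;
- items are drawn with replacement;
- no termination rule is applied.

```python
import math
import random

# Mean difficulty of strata 1 (easiest) to 9 (hardest), taken from Table A-2.
STRATUM_DIFFICULTY = [-2.67, -1.99, -1.30, -0.63, 0.0, 0.59, 1.36, 1.99, 2.63]


def p_correct(theta, b, a=1.0):
    """Normal ogive probability of a correct answer (a = 1.0 is hypothetical)."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


def simulate_stradaptive(theta, entry_stratum, n_items, seed=0):
    """Proportion correct for a simulated examinee of ability theta under an
    up-one/down-one stratum branching rule."""
    rng = random.Random(seed)
    top = len(STRATUM_DIFFICULTY)
    stratum = entry_stratum  # strata are numbered 1..9 as in Table A-2
    n_correct = 0
    for _ in range(n_items):
        b = STRATUM_DIFFICULTY[stratum - 1]
        if rng.random() < p_correct(theta, b):
            n_correct += 1
            stratum = min(stratum + 1, top)  # branch to a harder stratum
        else:
            stratum = max(stratum - 1, 1)  # branch to an easier stratum
    return n_correct / n_items


for theta in (-1.0, 0.0, 1.0):
    print(theta, round(simulate_stradaptive(theta, entry_stratum=5, n_items=2000), 3))
```

Every correct answer below the top stratum is matched by a move up, and every incorrect answer above the bottom stratum is matched by a move down. Because an examinee can never be more than eight strata away from the starting point, the proportion correct approaches .5 as the test lengthens. The exception is an examinee whose ability lies near or beyond the ends of the difficulty range and who therefore spends much of the test at a boundary stratum.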

The absence of a motivating effect for the adaptive test in the High-Ability 
group may be explained by the same reasoning that explains a motivating effect 
in the Low-Ability group. The conventional test used in the present study was 
constructed to be maximally appropriate to individuals of about average ability 
in the normative population of high-ability students. On the basis of the mean 
difficulty level of the conventional test items, the average high-ability exam- 
inee was expected to answer about 54% of the items correctly. Results indicated 
that the average high-ability examinee obtained about 58% correct on the con- 
ventional test. Similarly, on the stradaptive test most examinees, regardless 
of ability level, should answer about 50% of the items administered to them 
correctly. In fact, the average High-Ability examinee obtained 50% correct on 
the stradaptive test. 

Most high-ability students probably were accustomed to taking tests designed
to be appropriately difficult for average individuals in their group.
Therefore, the stradaptive test likely was perceived as consistent with their
usual expectations of their level of test performance, and thus did not in
itself have motivating effects. Undoubtedly there were some very high-ability
students in the High-Ability group who perceived themselves as performing less
well on the stradaptive test than in their typical ability test performance.
But the possibly adverse effects for these students probably were balanced by
the effects of the stradaptive test on some relatively low-ability students in
the High-Ability group who, like most low-ability examinees, were pleasantly
surprised by their levels of performance on the test.

On the conventional test the percentage correct for High-Ability testees
(58%) was substantially greater than that for Low-Ability students (43%), and
their mean ability scores differed substantially. On the stradaptive test, by
contrast, the percentages of correct responses obtained by the two groups were
more similar (50% vs. 46.5%), and their mean ability scores were also closer
together.

The results concerning the effects of testing strategy and KR conditions
on the performance of examinees of different ability levels are particularly
important because of their implications for the measurement of "maximum
performance" levels. Standard testing conditions (i.e., conventional objective
tests administered without provision of KR) did not elicit maximum levels of
performance from either group of examinees studied. Modifications of testing
conditions, specifically the provision of KR for high-ability examinees and
either the provision of KR or the administration of an adaptive test for
low-ability examinees, led to significantly higher levels of performance.
Perhaps more important, in some cases modifying the testing conditions usually
assumed to elicit maximum levels of performance reduced the score differences
between two groups of supposedly different ability levels to nonsignificant
levels.

Other Effects of KR 

The internal consistency reliability of the 50-item conventional test was 
not found to differ significantly as a function of KR conditions; reliability 
under KR conditions was .89, while that under No-KR conditions was .91. Thus, 
the data of the present study suggest that KR neither adds nor subtracts 
reliable variance in a set of ability scores. However, given the lack of other 
data relevant to this question, further study will be needed to delineate ex- 
actly what effects, if any, KR has on the precision of measurement. 
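The report gives the reliability figures (.89 with KR, .91 without) but does not name the coefficient used. For a test of dichotomously scored items such as the 50-item conventional test, a common choice is Kuder-Richardson formula 20. KR-20 (unrelated to KR, knowledge of results) is the special case of coefficient alpha for 0/1 items: KR-20 = k/(k - 1) * (1 - sum(p_i * q_i) / var(X)), where k is the number of items, p_i is the proportion answering item i correctly, q_i = 1 - p_i, and var(X) is the variance of total scores. The sketch below assumes KR-20 with a population (divide-by-N) total-score variance, and the small response matrix is hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    # Sum of item variances p*q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_persons
        sum_pq += p * (1.0 - p)
    total_variance = pvariance(totals)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical responses: 4 examinees x 3 items.
matrix = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(matrix))  # 0.75
```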

Mean response consistency scores on the stradaptive test were not found to 
differ as a function of KR conditions. Thus, while KR increased mean test 
scores, it did not appear to increase the consistency of examinees' response 
patterns. Similarly, response latency was not related to KR conditions. Thus, 
KR did not affect the speed with which test responses were made. 

The finding that there were no significant differences in mean response
latency between the conventional and stradaptive tests differs from the
findings reported by Waters (1975). Waters found that mean response times on
the stradaptive tests were significantly longer than response times on a
conventional test; he interpreted this finding to mean that examinees had to
"think longer" about the answers to stradaptive test items because the items
were selected to be at the limit of the examinee's ability level. In the
present study, however, this appeared not to be the case. The differing result
may have been due to characteristics of the test items or the testees. Further
research on response latency differences between conventional and adaptive
tests is needed.

KR also did not systematically affect performance on a conventional test
administered after the experimental test. Analysis of the post-test
number-correct scores indicated that mean post-test scores did not differ as a
function of previous testing conditions, either KR or the test administered
(i.e., the conventional or the stradaptive). From these data it may be
concluded that the facilitating effects of KR on the experimental test did not
transfer to performance on a test given subsequently under No-KR conditions.
Further, if there were adverse reactions, such as frustration at no longer
receiving KR, these reactions did not affect performance to such a degree that
examinees who had received KR obtained lower post-test scores than did those
who had not received KR.


It is also interesting to note that the group effect was highly significant
on the post-test: high-ability examinees obtained significantly higher scores
than did low-ability examinees. However, results from the experimental tests
had shown that under some conditions of administration, the scores of
high- and low-ability examinees were not significantly different. The
conditions in effect on the post-test were similar to those of most testing
situations: knowledge of results was not provided, and the test administered
was a conventional rather than an adaptive test. Under these conditions, the
group differences found corresponded to those expected on the basis of the
previous levels of performance of the two groups.
KR also did not significantly affect the correlations of scores between
the experimental test and a conventional post-test. Although these correlations
were lower for examinees who had received KR on the experimental test than for
those who had not, the differences between the correlations in the KR and No-KR
groups were not statistically significant. Thus, although KR affected mean
levels of test performance, its effects for the total group were relatively
constant across individuals. Further research using repeated measures designs
should investigate whether KR can result in significant individual differences
in ability test performance.

Conclusions 

The results of the present study demonstrated that providing examinees
with immediate knowledge of results can lead to significant increases in
ability test scores. Thus, it appears that knowledge of results can increase
the extent to which ability tests measure the "maximum performance" capabilities
of individuals. However, further research is needed to determine whether
providing examinees with KR increases the validity of test scores.

In a group of low-ability examinees, test scores were higher on the
stradaptive test than on the conventional test. This suggests that
adaptive testing may have motivational effects similar to those of immediate
knowledge of results, particularly for examinees for whom conventional
tests are too difficult.

Testing conditions had somewhat different effects on the performance
levels of high- and low-ability examinees, and under some conditions the
expected group differences in test scores were not found. This result suggests
that testing conditions may affect not only the conclusions drawn about
individuals on the basis of test scores, but also the conclusions drawn about
group differences in ability level. Therefore, in studying differences in
psychological variables, more attention should be paid to the possible impact
of the conditions of measurement on the obtained results.

While knowledge of results can be provided on paper-and-pencil tests, its
provision is, at best, inefficient and unwieldy. Consequently, it is not
likely that providing KR will become standard in the administration of such
tests. Further, most adaptive tests must be administered by computer.
Studies of the few adaptive tests that can be administered by paper-and-pencil
methods (e.g., the flexilevel test; Lord, 1971) have shown that significant
numbers of examinees fail to follow the branching directions properly and thus
invalidate their test protocols (e.g., Olivier, 1974).

It is evident that the administration of ability tests by computer
provides psychological measurement with capabilities that have been either
difficult or impossible to implement using paper-and-pencil testing methods.
The facilitative effects of both immediate knowledge of results and adaptive
testing on ability test performance found in the present study support the use
of computer-assisted testing procedures to provide measurements consistent
with a maximum performance conceptualization of human abilities.
Table A-2

Entry Strata for the Stradaptive Test
as a Function of Reported Grade-Point Average

Grade-Point        Entry      Mean Difficulty Level
Average            Stratum    of Entry Stratum
3.76 to 4.00          9             2.63
3.51 to 3.75          8             1.99
3.26 to 3.50          7             1.36
3.01 to 3.25          6              .59
2.76 to 3.00          5             0
2.51 to 2.75          4             -.63
2.26 to 2.50          3            -1.30
2.01 to 2.25          2            -1.99
2.00 or less          1            -2.67
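Table A-2 amounts to a lookup rule that assigns each examinee a starting stratum from reported grade-point average. The minimal sketch below implements that lookup. It assumes GPAs are reported to two decimal places, as in the table; a value between the printed band limits, such as 3.755, would need a rounding convention that the table does not specify.

```python
# Lower GPA limit of each band and its entry stratum, from Table A-2.
GPA_BANDS = [
    (3.76, 9), (3.51, 8), (3.26, 7), (3.01, 6),
    (2.76, 5), (2.51, 4), (2.26, 3), (2.01, 2),
]


def entry_stratum(gpa: float) -> int:
    """Entry stratum for the stradaptive test given a two-decimal GPA."""
    for lower_limit, stratum in GPA_BANDS:
        if gpa >= lower_limit:
            return stratum
    return 1  # "2.00 or less"


print(entry_stratum(3.40))  # 7
```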
Table A-3

Normal Ogive Parameters for Items of the
Peaked Conventional Test

Item     Discrimination (a)     Difficulty (b)

Mean

Table A-4